Collaborating Authors

Andrew Gelman



Amortized Bayesian Workflow (Extended Abstract)

arXiv.org Machine Learning

Bayesian inference often faces a trade-off between computational speed and sampling accuracy. We propose an adaptive workflow that integrates rapid amortized inference with gold-standard MCMC techniques to achieve both speed and accuracy when performing inference on many observed datasets. Our approach uses principled diagnostics to guide the choice of inference method for each dataset, moving along the Pareto front from fast amortized sampling to slower but guaranteed-accurate MCMC when necessary. By reusing computations across steps, our workflow creates synergies between amortized and MCMC-based inference. We demonstrate the effectiveness of this integrated approach on a generalized extreme value task with 1000 observed data sets, showing 90x time efficiency gains while maintaining high posterior quality.
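The per-dataset routing described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the toy model (a normal mean with unit variance and a flat prior), the functions `amortized_sampler`, `diagnostic_ok`, `mcmc_sampler`, and `run_workflow`, and the threshold value are all hypothetical stand-ins. The "reuse of computations" is read here, as an assumption, to mean starting MCMC from an amortized draw.

```python
import math
import random
import statistics


def amortized_sampler(data, rng, n_draws=200):
    # Hypothetical stand-in for a trained amortized (neural) posterior sampler:
    # it returns draws centered on the sample mean at almost no cost.
    center = statistics.fmean(data)
    scale = 1.0 / math.sqrt(len(data))
    return [rng.gauss(center, scale) for _ in range(n_draws)]


def diagnostic_ok(data, draws, threshold=3.0):
    # Hypothetical diagnostic: trust the amortized draws only when the dataset
    # looks like the data the sampler was (assumed to be) trained on.
    return abs(statistics.fmean(data)) < threshold


def mcmc_sampler(data, init, rng, n_steps=500, step=0.5):
    # Random-walk Metropolis for the mean of a unit-variance normal model with
    # a flat prior, initialized at an amortized draw.
    def log_post(theta):
        return -0.5 * sum((x - theta) ** 2 for x in data)

    theta, current = init, log_post(init)
    draws = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_post(proposal)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        draws.append(theta)
    return draws


def run_workflow(datasets, seed=0):
    # Try the fast method first; fall back to MCMC when the diagnostic fails.
    rng = random.Random(seed)
    results = []
    for data in datasets:
        draws = amortized_sampler(data, rng)
        if diagnostic_ok(data, draws):
            results.append(("amortized", draws))
        else:
            results.append(("mcmc", mcmc_sampler(data, draws[-1], rng)))
    return results


datasets = [[0.1, -0.2, 0.3], [5.0, 5.5, 4.8]]
for method, draws in run_workflow(datasets):
    print(method, round(statistics.fmean(draws), 2))
```

In this sketch only the second dataset triggers the slower MCMC step, which is the source of the time savings when most datasets pass the diagnostic.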


An Easy to Interpret Diagnostic for Approximate Inference: Symmetric Divergence Over Simulations

arXiv.org Machine Learning

It is important to estimate the errors of probabilistic inference algorithms. Existing diagnostics for Markov chain Monte Carlo methods assume inference is asymptotically exact, and are not appropriate for approximate methods like variational inference or Laplace's method. This paper introduces a diagnostic based on repeatedly simulating datasets from the prior and performing inference on each. The central observation is that it is possible to estimate a symmetric KL-divergence defined over these simulations.
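One way to see why a symmetric KL-divergence can be estimated from simulations: in the sum KL(p||q) + KL(q||p), taken over joint distributions of parameters and data, the intractable log p(y) terms cancel, leaving only quantities that can be evaluated. The sketch below illustrates this on an assumed toy model (theta ~ N(0, 1), y | theta ~ N(theta, 1), exact posterior N(y/2, 1/2)) with a hypothetical approximate posterior N(y/2, `approx_var`). It is an illustration of the idea, not necessarily the estimator used in the paper.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def log_joint(theta, y):
    # log p(theta) + log p(y | theta) for the assumed toy model.
    return log_normal_pdf(theta, 0.0, 1.0) + log_normal_pdf(y, theta, 1.0)


def estimate_symmetric_kl(approx_var, n_sims=20000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        # Simulate a parameter and a dataset from the prior predictive.
        theta = rng.gauss(0.0, 1.0)
        y = rng.gauss(theta, 1.0)
        # "Run inference": the hypothetical approximation q(. | y).
        q_mean = y / 2.0
        phi = rng.gauss(q_mean, math.sqrt(approx_var))
        # log p(theta, y) - log q(theta | y), at the true and the inferred draw.
        at_true = log_joint(theta, y) - log_normal_pdf(theta, q_mean, approx_var)
        at_approx = log_joint(phi, y) - log_normal_pdf(phi, q_mean, approx_var)
        total += at_true - at_approx
    return total / n_sims


print(estimate_symmetric_kl(approx_var=0.5))   # exact posterior: close to 0
print(estimate_symmetric_kl(approx_var=0.25))  # too-narrow q: close to 0.25
```

For two Gaussians with equal means and variances a and b, the symmetric KL is (a/b + b/a - 2) / 2, which gives 0.25 for a = 0.5 and b = 0.25, matching the second estimate up to Monte Carlo error.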


Essential Resources to Learn Bayesian Statistics - KDnuggets

#artificialintelligence

In this post, I summarize a series of resources to get started with Bayesian Statistics. I compiled these references based on my experience and opinion as to what a good introduction and next steps are in this process. This is not an academic curriculum or anything tremendously rigorous, but it is a comprehensive list that will surely get you started on the journey of revisiting or beginning your statistics studies. Many of the references below were recommended to me in several workshops I've attended, and I want to share them with those who, like me, want to be better at statistics and Machine Learning (ML). The first resource I can think of for beginners interested in Bayesian statistics and modeling is Richard McElreath's Statistical Rethinking.


Correcting Predictions for Approximate Bayesian Inference

arXiv.org Machine Learning

Bayesian models quantify uncertainty and facilitate optimal decision-making in downstream applications. For most models, however, practitioners are forced to use approximate inference techniques that lead to sub-optimal decisions due to incorrect posterior predictive distributions. We present a novel approach that corrects for inaccuracies in posterior inference by altering the decision-making process. We train a separate model to make optimal decisions under the approximate posterior, combining interpretable Bayesian modeling with optimization of direct predictive accuracy in a principled fashion. The solution is generally applicable as a plug-in module for predictive decision-making for arbitrary probabilistic programs, irrespective of the posterior inference strategy. We demonstrate the approach empirically in several problems, confirming its potential.


KDnuggets – Favorite Data Science / Machine Learning Blog

#artificialintelligence

Thanks to Bob E. Hayes' recent tweet, I found his blog post on Favorite Data Science and Machine Learning Blogs, which in turn was based on the Kaggle ML and Data Science Survey 2017 results. The question, answered by 8140 Kagglers, was "What are your top 3 favorite data science blogs/podcasts/newsletters? (Select up to three options)", and here are their answers (a total of 21 blogs/podcasts/newsletters): Figure 1: Kaggle 2017 Survey Favorite Data Science Blogs/Podcasts/Newsletters. Excluding "Other", here are the top 10 blogs and the percentage of Kagglers who selected them. The most popular podcast was Becoming a Data Scientist, with a 16.0% share. The next question in the Kaggle survey compared favorite blogs/podcasts/newsletters among data scientists who are employed and those looking for work. KDnuggets was No. 1 in both categories! Figure 1: Kaggle 2017 Survey: Top 10 Data Science Blogs/Podcasts/Newsletters among employed data scientists and those who are looking for a job.


Stan Biweekly Roundup, 6 October 2017

#artificialintelligence

Jonah Gabry returned from teaching a one-week course for a special EU research institute in Spain. Mitzi Morris has been knocking out bug fixes for the parser and some pull requests to refactor the underlying type inference to clear the way for tuples, sparse matrices, and higher-order functions. Michael Betancourt, with help from Sean Talts, spent last week teaching an intro course to physicists about Stan. Charles Margossian attended and said it went really well. Ben Goodrich, in addition to handling a slew of RStan issues, has been diving into the math library to define derivatives for Bessel functions. Aki Vehtari has put us in touch with the MxNet developers at Amazon UK, and we had our first conference call with them to talk about adding sparse matrix functionality to Stan (Neil Lawrence is working there now).


Bayesian Basics, Explained

@machinelearnbot

Editor's note: The following is an interview with Columbia University Professor Andrew Gelman conducted by marketing scientist Kevin Gray, in which Gelman spells out the ABCs of Bayesian statistics. Kevin Gray: Most marketing researchers have heard of Bayesian statistics but know little about it. Can you briefly explain in layperson's terms what it is and how it differs from the 'ordinary' statistics most of us learned in college? Andrew Gelman: Bayesian statistics uses the mathematical rules of probability to combine data with "prior information" to give inferences which (if the model being used is correct) are more precise than would be obtained by either source of information alone. Classical statistical methods avoid prior distributions.
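The statement that combining data with prior information is more precise than either source alone can be illustrated with the textbook case of a normal prior and a normal data estimate with known standard error. The numbers and the function name `normal_posterior` below are made up for illustration and do not come from the interview.

```python
def normal_posterior(prior_mean, prior_sd, data_mean, data_se):
    # Precisions (inverse variances) add when a normal prior is combined
    # with a normal likelihood of known standard error.
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / data_se ** 2
    post_precision = prior_precision + data_precision
    # The posterior mean is a precision-weighted average of the two sources.
    post_mean = (prior_precision * prior_mean + data_precision * data_mean) / post_precision
    post_sd = post_precision ** -0.5
    return post_mean, post_sd


# Illustrative numbers: a vague prior around 0 and a data estimate of 3.
post_mean, post_sd = normal_posterior(prior_mean=0.0, prior_sd=2.0, data_mean=3.0, data_se=1.0)
print(post_mean, post_sd)
```

Because the precisions add, the posterior standard deviation is always smaller than both the prior standard deviation and the data standard error, provided the model is correct.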


Bayesian Basics, Explained

#artificialintelligence

Editor's note: The following is an interview with Columbia University Professor Andrew Gelman conducted by marketing scientist Kevin Gray, in which Gelman spells out the ABCs of Bayesian statistics. Andrew Gelman: Bayesian statistics uses the mathematical rules of probability to combine data with "prior information" to give inferences which (if the model being used is correct) are more precise than would be obtained by either source of information alone. Classical statistical methods avoid prior distributions. In classical statistics, you might include in your model a predictor (for example), or you might exclude it, or you might pool it as part of some larger set of predictors in order to get a more stable estimate. These are pretty much your only choices.


100 Blogs on Analytics, Big Data, Data Science, and Machine Learning

@machinelearnbot

We've added some blogs that were missing in the original list, and eliminated some that aren't worth mentioning, hoping to make this list less biased.

AnalyticBridge, about advanced analytics, books, salary surveys, training, challenges.
Anil Batra's Web Analysis (Analytics), Online Advertising and Behavioral Targeting blog.
BigDataNews: General articles about big data, as well as news (selected press releases). Business.
CoolData: By Kevin MacDonell, on analytics, predictive modeling and related cool data stuff for fund-raising in higher education.
Cloud of data blog: By Paul Miller, aims to help clients understand the implications of taking data and more to the Cloud.